Abstract

We propose a general framework for reconstructing and denoising single entries of incomplete
and noisy matrices. We describe: effective algorithms for deciding if an entry can be
reconstructed and, if so, for reconstructing and denoising it; and a priori bounds on the error
of each entry, individually. In the noiseless case our algorithm is exact. For rank-one
matrices, the new algorithm is fast, admits a highly-parallel implementation, and produces an
error-minimizing estimate that is qualitatively close to our theoretical bound and to the
state-of-the-art Nuclear Norm and OptSpace methods.

1. Introduction 

Matrix completion is the task of reconstructing a low-rank matrix from a subset of its entries.
It occurs naturally in many practically relevant problems, such as missing feature imputation,
multi-task learning (Argyriou et al., 2008), transductive learning (Goldberg et al., 2010), or
collaborative filtering and link prediction (Srebro et al., 2005; Acar et al., 2009; Menon and
Elkan, 2011).

Almost all known methods performing matrix completion are optimization methods such as the
max-norm and nuclear norm heuristics (Srebro et al., 2005; Candès and Recht, 2009; Tomioka et
al., 2010), or OptSpace (Keshavan et al., 2010), to name a few amongst many.



These methods have in common that in general (a) they reconstruct the whole matrix and (b)
error bounds are given for all of the matrix, not single entries. These two properties of
existing methods are in particular unsatisfactory^1 in the scenario where one is interested
only in predicting (resp. imputing) one single missing entry or a set of interesting missing
entries instead of all. For real data this is a more natural task than imputing all missing
entries, in particular in the presence of large scale data (resp. big data).

Indeed, the design of such a method is not only desirable but also feasible, as the results
of Király et al. (2012) suggest by relating algebraic combinatorial properties and the low-rank
setting to the reconstructability of the data. Namely, the authors provide algorithms which can
decide for one entry if it can be, in principle, reconstructed or not, thus yielding a
statement of trustability for the output of any algorithm.^2



Machine Learning Group, TU Berlin, franz.j.kiraly@tu-berlin.de
Discrete Geometry Group, FU Berlin, theran@math.fu-berlin.de

^1 While the existing methods may be applied to a submatrix, it is always at the cost of
accuracy if the data is sparse, and they do not yield statements on single entries.

^2 The authors also provide an algorithm for reconstructing some missing entries in the
arbitrary rank case, but without obtaining global or entry-wise error bounds, or a strategy to
reconstruct all reconstructible entries.
In this paper, we demonstrate for the first time how algebraic combinatorial techniques,
combined with stochastic error minimization, can be applied to (a) reconstruct single missing
entries of a matrix and (b) provide lower variance bounds for the error of any algorithm resp.
estimator for that particular entry, where the error bound can be obtained without actually
reconstructing the entry in question. In detail, our contributions include:

• the construction of a variance-minimal and unbiased estimator for any fixed missing entry 
of a rank-one-matrix, under the assumption of known noise variances 

• an explicit form for the variance of that estimator, being a lower bound for the variance of
any unbiased estimator of any fixed missing entry and thus yielding a quantitative measure
of the trustability of that entry reconstructed from any algorithm

• the description of a strategy to generalize the above to any rank 

• comparison of the estimator with two state-of-the-art optimization algorithms (OptSpace 
and nuclear norm), and error assessment of the three matrix completion methods with the 
variance bound 

Note that most of the methods and algorithms presented in this paper are restricted to rank
one. This is not, however, inherent in the overall scheme, which is general. We depend on rank
one only in the sense that we understand the combinatorial-algebraic structure of rank-one
matrix completion exactly, whereas the behavior in higher rank is not yet as well understood.
Nonetheless, it is, in principle, accessible and, once available, can be "plugged in" to the
results here without changing the complexity much.



2. The Algebraic Combinatorics of Matrix Completion 



2.1. A review of known facts. In Király et al. (2012), an intricate connection between the
algebraic combinatorial structure, asymptotics of graphs and analytical reconstruction bounds
has been exposed. We will refine some of the theoretical concepts presented in that paper, which
will allow us to construct the entry-wise estimator.

Definition 2.1 A matrix $M \in \{0,1\}^{m \times n}$ is called a mask. If A is a partially known
matrix, then the mask of A is the mask which has 1-s in exactly the positions which are known in
A, and 0-s otherwise.

Definition 2.2 Let M be an $(m \times n)$ mask. We will call the unique bipartite graph G(M)
which has M as bipartite adjacency matrix the completion graph of M. We will refer to the m
vertices of G(M) corresponding to the rows of M as blue vertices, and to the n vertices of G(M)
corresponding to the columns as red vertices. If $e = (i, j)$ is an edge in $K_{m,n}$ (where
$K_{m,n}$ is the complete bipartite graph with m blue and n red vertices), we will also write
$A_e$ instead of $A_{ij}$ for any $(m \times n)$ matrix A.



A fundamental result (Király et al., 2012, Theorem 2.3.5) says that identifiability and
reconstructability are, up to a null set, graph properties.

Theorem 2.3 Let A be a generic^3 and partially known $(m \times n)$ matrix of rank r, let M be
the mask of A, let i, j be integers. Whether $A_{ij}$ is reconstructible (uniquely, or up to
finite choice) depends only on M and the true rank r; in particular, it does not depend on the
true A.

^3 In particular, if A is sampled from a continuous density, then the set of non-generic A is a
null set.
For rank one, as opposed to higher rank, the set of reconstructible entries is easily obtainable
from G(M) by combinatorial means:

Theorem 2.4 (Király et al., 2012, Theorem 2.5.36 (i)) Let $G \subseteq K_{m,n}$ be the
completion graph of a partially known $(m \times n)$ matrix A. Then the set of uniquely
reconstructible entries of A is exactly the set of $A_e$ with e in the transitive closure of G.
In particular, all of A is reconstructible if and only if G is connected.
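As a minimal sketch of how Theorem 2.4 can be checked in practice, the following Python snippet marks an entry (i, j) as reconstructible when row i and column j lie in the same connected component of the completion graph, which is how we read "in the transitive closure of G" here. The function name `reconstructible_entries` and the union-find implementation are illustrative assumptions, not part of the original exposition.

```python
def reconstructible_entries(mask):
    """Return all (i, j) whose rank-one entry is determined by the mask.

    Rows are blue vertices 0..m-1, columns are red vertices m..m+n-1.
    Hypothetical helper: connectivity is tracked with a union-find structure.
    """
    m, n = len(mask), len(mask[0])
    parent = list(range(m + n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Every observed entry is an edge joining its row and its column.
    for i in range(m):
        for j in range(n):
            if mask[i][j]:
                parent[find(i)] = find(m + j)

    return {(i, j) for i in range(m) for j in range(n) if find(i) == find(m + j)}


mask = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
]
entries = reconstructible_entries(mask)
```

In this example the unobserved entry (1, 0) is reported as reconstructible, while (0, 2) is not, because row 0 and column 2 lie in different components.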



2.2. Reconstruction on the transitive closure. We extend Theorem 2.4's theoretical
reconstruction guarantee by describing an explicit, algebraic algorithm for actually doing the
reconstruction. This algorithm will be the basis of an entry-wise, variance-optimal estimator in
the noisy case. In any rank, such a reconstruction rule can be obtained by exposing equations
which explicitly give known and unknown entries in terms of only known entries, due to the fact
that the set of low-rank matrices is an irreducible variety (the common vanishing locus of
finitely many polynomial equations). We are able to derive the reconstruction equations for rank
one.

Definition 2.5 Let $P \subseteq K_{m,n}$ (resp. $C \subseteq K_{m,n}$) be a path (resp. cycle),
with a fixed start and end (resp. traversal order). We will denote by $E^+(P)$ the set of edges
in P (resp. $E^+(C)$ and C) traversed from a blue vertex to a red one, and by $E^-(P)$ the set
of edges traversed from a red vertex to a blue one.^4 From now on, when we speak of "oriented
paths" or "oriented cycles", we mean with this sign convention and some fixed traversal order.

Let $A = A_{ij}$ be an $(m \times n)$ matrix of rank 1, and identify the entries $A_{ij}$ with
the edges of $K_{m,n}$. For an oriented cycle C, we define the polynomials

$$P_C(A) = \prod_{e \in E^+(C)} A_e - \prod_{e \in E^-(C)} A_e, \quad\text{and}$$

$$L_C(A) = \sum_{e \in E^+(C)} \log A_e - \sum_{e \in E^-(C)} \log A_e,$$

where for negative entries of A, we fix a branch of the complex logarithm.

Theorem 2.6 Let $A = A_{ij}$ be a generic $(m \times n)$ matrix of rank 1. Let
$C \subseteq K_{m,n}$ be an oriented cycle. Then, $P_C(A) = L_C(A) = 0$.

Proof: The determinantal ideal of rank one is a binomial ideal generated by the $(2 \times 2)$
minors of A (where entries of A are considered as variables). The minor equations are exactly
$P_C(A)$, where C is an elementary oriented four-cycle; if C is an elementary 4-cycle, denote
its edges by a(C), b(C), c(C), d(C), with $E^+(C) = \{a(C), d(C)\}$. Let $\mathcal{C}$ be the
collection of the elementary 4-cycles, and define
$L_{\mathcal{C}}(A) = \{L_C(A) : C \in \mathcal{C}\}$ and
$P_{\mathcal{C}}(A) = \{P_C(A) : C \in \mathcal{C}\}$.

By sending the term $\log A_e$ to a formal variable $x_e$, we see that the free
$\mathbb{Z}$-group generated by the $L_C(A)$ is isomorphic to $H_1(K_{m,n}, \mathbb{Z})$. With
this equivalence, it is straightforward that, for any oriented cycle D, $L_D(A)$ lies in the
$\mathbb{Z}$-span of elements of $L_{\mathcal{C}}(A)$ and, therefore, formally,

$$L_D(A) = \sum_{C \in \mathcal{C}} \alpha_C \cdot L_C(A)$$

with the $\alpha_C \in \mathbb{Z}$. Thus $L_D(\cdot)$ vanishes when A is rank one, since the
r.h.s. does. Exponentiating, we see that

$$\left(\prod_{e \in E^+(D)} A_e\right) \left(\prod_{e \in E^-(D)} A_e\right)^{-1}
= \prod_{C \in \mathcal{C}} \left(A_{a(C)} A_{d(C)} A_{b(C)}^{-1} A_{c(C)}^{-1}\right)^{\alpha_C}.$$

If A is generic and rank one, the r.h.s. evaluates to one, implying that $P_D(A)$ vanishes. □

^4 This is equivalent to fixing the orientation of $K_{m,n}$ that directs all edges from blue to
red, and then taking $E^+(P)$ to be the set of edges traversed forwards and $E^-(P)$ the set of
edges traversed backwards. This convention is convenient notationally, but any initial
orientation of $K_{m,n}$ will give us the same result.
Corollary 2.7 Let $A = A_{ij}$ be an $(m \times n)$ matrix of rank 1. Let v, w be two vertices
in $K_{m,n}$. Let P, Q be two oriented paths in $K_{m,n}$ starting at v and ending at w. Then,
for all A, it holds that $L_P(A) = L_Q(A)$.
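To make Theorem 2.6 and Corollary 2.7 concrete, the sketch below evaluates the signed log-sum along two different oriented paths and one oriented cycle of a small rank-one matrix. The vertex encoding `("r", i)` / `("c", j)`, the function name `path_log_sum`, and the factor vectors `u`, `v` are assumptions made for this illustration; paths here start at a blue (row) vertex, so that blue-to-red edges carry the sign +.

```python
import math


def path_log_sum(log_entry, path):
    """Signed log-sum along an oriented path or cycle.

    `path` alternates blue (row) and red (column) vertices, written as
    ("r", i) and ("c", j). Blue->red edges count +, red->blue edges count -.
    """
    total = 0.0
    for (kind_a, a), (_, b) in zip(path, path[1:]):
        if kind_a == "r":      # blue -> red: edge (a, b) lies in E^+
            total += log_entry(a, b)
        else:                  # red -> blue: edge (b, a) lies in E^-
            total -= log_entry(b, a)
    return total


# Hypothetical rank-one matrix A_ij = u_i * v_j with positive entries.
u = [2.0, 3.0, 5.0]
v = [1.0, 4.0, 0.5]


def log_entry(i, j):
    return math.log(u[i] * v[j])


# Two different oriented paths from row 0 to column 2.
p = [("r", 0), ("c", 0), ("r", 1), ("c", 2)]
q = [("r", 0), ("c", 1), ("r", 2), ("c", 2)]
# A closed oriented four-cycle through rows 0, 1 and columns 0, 1.
cycle = [("r", 0), ("c", 0), ("r", 1), ("c", 1), ("r", 0)]

value_p = path_log_sum(log_entry, p)
value_q = path_log_sum(log_entry, q)
value_cycle = path_log_sum(log_entry, cycle)
```

Both path sums agree with $\log A_{02}$ up to floating-point error, and the cycle sum vanishes, which is the noiseless reconstruction rule in logarithmic form.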

Remark 2.8 It is possible to prove that the set of $P_C$ forms the set of polynomials vanishing
on the entries of A which is minimal with respect to certain properties. Namely, the $P_C$ form
a universal Gröbner basis for the determinantal ideal of rank 1, which implies the converse of
Theorem 2.6. From this, one can deduce that the estimators presented in Section 3.2 are
variance-minimal amongst all unbiased ones.



3. A Combinatorial Algebraic Estimate for Missing Entries and Their Error 

In this section, we will construct an estimator for matrix completion which (a) is able to complete 
single missing entries and (b) gives universal error estimates for that entry that are independent 
of the reconstruction algorithm. 



3.1. The sampling model. In all of the following, we will assume that the observations arise
from the following sampling process:

Assumption 3.1 There is an unknown fixed, rank one, matrix A which is generic, and an
$(m \times n)$ mask $M \in \{0,1\}^{m \times n}$ which is known. There is a (stochastic) noise
matrix $\mathcal{E} \in \mathbb{R}^{m \times n}$ whose entries are uncorrelated and which is
multiplicatively centered with finite, non-zero^5 variance; i.e.,
$E(\log \mathcal{E}_{ij}) = 0$ and $0 < \mathrm{Var}(\log \mathcal{E}_{ij}) < \infty$ for all i
and j.

The observed data is the matrix $A \circ M \circ \mathcal{E} = \Omega(A \circ \mathcal{E})$,
where $\circ$ denotes the Hadamard (i.e., component-wise) product. That is, the observation is a
matrix with entries $A_{ij} \cdot M_{ij} \cdot \mathcal{E}_{ij}$.

The assumption of multiplicative noise is a necessary precaution in order for the presented
estimator (and in fact, any estimator) for the missing entries to have bounded variance, as
shown in Example 3.2 below. This is not, in practice, a restriction, since an infinitesimal
additive error $\delta A_{ij}$ on an entry of A is equivalent to an infinitesimal multiplicative
error $\delta \log A_{ij} = \delta A_{ij} / A_{ij}$, and additive variances can be directly
translated into multiplicative variances if the density function for the noise is known.^6 The
previous observation implies that the multiplicative noise model is as powerful as any additive
one that allows bounded variance estimates.

^5 The zero-variance case corresponds to exact reconstruction, which is handled already by
Theorem 2.4.

^6 The multiplicative noise assumption causes the observed entries and the true entries to have
the same sign. The change of sign can be modeled by adding another multiplicative binary random
variable in the model which takes values ±1; this adds an independent combinatorial problem for
the estimation of the sign, which can be done by maximum likelihood. In order to keep the
exposition short and easy, we did not include this in the exposition.



Example 3.2 Consider the rank one matrix

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}.$$

The unique equation between the entries is $A_{11} A_{22} = A_{12} A_{21}$. Solving for any
entry will have another entry in the denominator, for example

$$A_{11} = \frac{A_{12} A_{21}}{A_{22}}.$$

Thus we get an estimator for $A_{11}$ when substituting observed and noisy entries for
$A_{12}, A_{21}, A_{22}$. When $A_{22}$ approaches zero, the estimation error for $A_{11}$
approaches infinity. In particular, if the density function of the error $E_{22}$ of $A_{22}$
is too dense around the value $-A_{22}$, then the estimate for $A_{11}$ given by the equation
will have unbounded variance. In such a case, one can show that no estimator for $A_{11}$ has
bounded variance.
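The blow-up in Example 3.2 can be illustrated numerically. The following sketch uses made-up entries of a rank-one (2 x 2) matrix and a few hand-picked additive perturbations of $A_{22}$ that approach $-A_{22}$; the values and the helper name `estimate_a11` are assumptions chosen only for illustration.

```python
def estimate_a11(a12, a21, a22):
    """Plug-in estimate of A_11 from the rank-one relation A_11 * A_22 = A_12 * A_21."""
    return a12 * a21 / a22


# Hypothetical rank-one matrix: 2 * 2 == 4 * 1.
a11, a12, a21, a22 = 2.0, 4.0, 1.0, 2.0

# Additive errors on A_22 that move the observed value towards zero.
deltas = (-1.0, -1.9, -1.99, -1.999)
errors = [abs(estimate_a11(a12, a21, a22 + d) - a11) for d in deltas]
```

The absolute error grows without bound as the perturbed denominator approaches zero, which is the mechanism behind the unbounded variance under additive noise.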

3.2. Estimating entries and error bounds. In this section, we construct the unbiased estimator
for the entries of a rank-one matrix with minimal variance. First, we define some notation to
ease the exposition:

Notations 3.3 We will denote by $a_{ij} = \log A_{ij}$ and $\varepsilon_{ij} = \log
\mathcal{E}_{ij}$ the logarithmic entries and noise. Thus, for some path P in $K_{m,n}$ we
obtain

$$L_P(A) = \sum_{e \in E^+(P)} a_e - \sum_{e \in E^-(P)} a_e.$$

Denote by $b_{ij} = a_{ij} + \varepsilon_{ij}$ the logarithmic (observed) entries, and B the
(incomplete) matrix which has the (observed) $b_{ij}$ as entries. Denote by
$\sigma_{ij} = \mathrm{Var}(b_{ij}) = \mathrm{Var}(\varepsilon_{ij})$.

The components of the estimator will be built from the $L_P$:

Lemma 3.4 Let G = G(M) be the graph of the mask M. Let $x = (v, w) \in K_{m,n}$ be any edge with
v red. Let P be an oriented path^7 in G(M) starting at v and ending at w. Then,

$$L_P(B) = \sum_{e \in E^+(P)} b_e - \sum_{e \in E^-(P)} b_e$$

is an unbiased estimator for $a_x$ with variance

$$\mathrm{Var}(L_P(B)) = \sum_{e \in P} \sigma_e.$$

Proof: By linearity of expectation and centeredness of $\varepsilon_e$, it follows that

$$E(L_P(B)) = \sum_{e \in E^+(P)} E(b_e) - \sum_{e \in E^-(P)} E(b_e),$$

thus $L_P(B)$ is unbiased. Since the $\varepsilon_e$ are uncorrelated, the $b_e$ also are; thus,
by Bienaymé's formula, we obtain

$$\mathrm{Var}(L_P(B)) = \sum_{e \in E^+(P)} \mathrm{Var}(b_e) + \sum_{e \in E^-(P)} \mathrm{Var}(b_e),$$

and the statement follows from the definition of $\sigma_e$.

^7 If $x \in G$, then P can also be the path consisting of the single edge x.
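A single-path estimate in the sense of Lemma 3.4 can be sketched as follows. The representation of a path as a list of `(edge, sign)` pairs, the dictionaries `log_obs` and `variances`, and the numbers used are hypothetical choices for this example.

```python
def path_estimate(path_edges, log_obs, variances):
    """Return (L_P(B), Var(L_P(B))) for one oriented path.

    path_edges: list of (edge, sign), sign +1 for edges in E^+ and -1 for E^-.
    log_obs, variances: dicts keyed by edge (i, j), holding b_e and sigma_e.
    """
    estimate = sum(sign * log_obs[e] for e, sign in path_edges)
    # Signs square to one, so every edge contributes its full variance.
    variance = sum(variances[e] for e, _ in path_edges)
    return estimate, variance


# Hypothetical path from row 0 to column 2 via column 0 and row 1.
path = [((0, 0), +1), ((1, 0), -1), ((1, 2), +1)]
log_obs = {(0, 0): 0.5, (1, 0): 0.2, (1, 2): 0.4}
variances = {(0, 0): 0.1, (1, 0): 0.2, (1, 2): 0.3}

estimate, variance = path_estimate(path, log_obs, variances)
```

Longer paths accumulate more variance, which is why the next step combines several paths with weights rather than relying on a single one.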

In the following, we will consider the following parametric estimator as a candidate for
estimating $a_x$:

Notations 3.5 Fix an edge $x = (v, w) \in K_{m,n}$. Let $\mathcal{P}$ be a basis for the set of
all oriented paths starting at v and ending at w,^8 and denote $\#\mathcal{P}$ by p. For
$\alpha \in \mathbb{R}^p$, set

$$X(\alpha) = \sum_{P \in \mathcal{P}} \alpha_P L_P(B).$$

Furthermore, we will denote by $\mathbf{1}$ the p-vector of ones.



The following lemma follows immediately from Lemma 3.4 and Theorem 2.6:

Lemma 3.6 $E(X(\alpha)) = \mathbf{1}^\top \alpha \cdot a_x$; in particular, $X(\alpha)$ is an
unbiased estimator for $a_x$ if and only if $\mathbf{1}^\top \alpha = 1$.

We will now show that minimizing the variance of $X(\alpha)$ can be formulated as a quadratic
program with coefficients entirely determined by the $\sigma_e$, the measurements $b_e$ and the
graph G(M). In particular, we will expose an explicit formula for the $\alpha$ minimizing the
variance. Before stating the theorem, we define a suitable kernel:

Definition 3.7 Let $e \in K_{m,n}$ be an edge. For an edge e and a path P, set
$c_{e,P} = \pm 1$ if $e \in E^{\pm}(P)$, otherwise $c_{e,P} = 0$. Let $P, Q \in \mathcal{P}$ be
any fixed oriented paths. Define the (weighted) path kernel
$k : \mathcal{P} \times \mathcal{P} \to \mathbb{R}$ by

$$k(P, Q) = \sum_{e \in K_{m,n}} c_{e,P} \cdot c_{e,Q} \cdot \sigma_e.$$

Under our assumption that $\mathrm{Var}(b_e) > 0$ for all $e \in K_{m,n}$, the path kernel is
positive definite, since it is a sum of p independent positive semi-definite functions; in
particular, its kernel matrix has full rank. Here is the variance-minimizing unbiased estimator:

Proposition 3.8 Let x = (s, t) be a pair of vertices, and $\mathcal{P}$ a basis for the s-t path
space in G with p elements. Let $\Sigma$ be the $p \times p$ kernel matrix of the path kernel
with respect to the basis $\mathcal{P}$. For any $\alpha \in \mathbb{R}^p$, it holds that

$$\mathrm{Var}(X(\alpha)) = \alpha^\top \Sigma \alpha.$$

Moreover, under the condition $\mathbf{1}^\top \alpha = 1$, the variance
$\mathrm{Var}(X(\alpha))$ is minimized by

$$\alpha = \frac{\Sigma^{-1} \mathbf{1}}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}}.$$

^8 This is the set of words equal to the formal generator $x_{(v,w)}$ in the free abelian group
generated by the $x_e$, subject to the relations $L_C = 0$ for all cycles C in
$G \cup \{(v, w)\}$. Independence can be taken as linear independence of the coefficient vectors
of the $L_P$.
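Proof: By inserting definitions, we obtain

$$X(\alpha) = \sum_{P \in \mathcal{P}} \alpha_P L_P(B)
= \sum_{P \in \mathcal{P}} \alpha_P \sum_{e \in K_{m,n}} c_{e,P} b_e.$$

Writing $b = (b_e) \in \mathbb{R}^{mn}$ as vectors, and
$C = (c_{e,P}) \in \mathbb{R}^{mn \times p}$ as matrices, we obtain

$$X(\alpha) = b^\top C \alpha.$$

By using that $\mathrm{Var}(\lambda \cdot) = \lambda^2 \mathrm{Var}(\cdot)$ for any scalar
$\lambda$, and uncorrelatedness of the $b_e$, an elementary calculation yields

$$\mathrm{Var}(X(\alpha)) = \alpha^\top \Sigma \alpha.$$

In order to determine the minimum of the variance in $\alpha$, consider the Lagrangian

$$L(\alpha, \lambda) = \alpha^\top \Sigma \alpha
+ \lambda \left(1 - \sum_{P \in \mathcal{P}} \alpha_P\right),$$

where the slack term models the condition $\mathbf{1}^\top \alpha = 1$. An elementary
calculation yields

$$\frac{\partial L}{\partial \alpha} = 2 \Sigma \alpha - \lambda \mathbf{1},$$

where $\mathbf{1}$ is the vector of ones. Due to positive definiteness of $\Sigma$, the function
$\mathrm{Var}(X(\alpha))$ is convex; thus
$\alpha = \Sigma^{-1} \mathbf{1} / \mathbf{1}^\top \Sigma^{-1} \mathbf{1}$ will be the unique
$\alpha$ minimizing the variance while satisfying $\mathbf{1}^\top \alpha = 1$.
□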
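The closed form of Proposition 3.8 can be sketched in a few lines of dependency-free Python. The helper names `solve` and `optimal_weights`, the small Gaussian elimination routine, and the example path matrix are assumptions for illustration; a numerical library would normally replace the hand-written solver.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def optimal_weights(C, sigma):
    """C[e][P] is the signed path matrix, sigma[e] the log-variance of edge e.

    Returns (alpha, Sigma, variance) with alpha = Sigma^{-1} 1 / (1^T Sigma^{-1} 1).
    """
    edges, p = len(C), len(C[0])
    Sigma = [[sum(C[e][a] * C[e][b] * sigma[e] for e in range(edges))
              for b in range(p)] for a in range(p)]
    w = solve(Sigma, [1.0] * p)
    total = sum(w)
    alpha = [wi / total for wi in w]
    variance = sum(alpha[a] * Sigma[a][b] * alpha[b]
                   for a in range(p) for b in range(p))
    return alpha, Sigma, variance


# Hypothetical example: the observed entry itself (one edge) plus a
# three-edge detour, all edges with unit log-variance.
C = [[1, 0], [0, 1], [0, -1], [0, 1]]
sigma = [1.0, 1.0, 1.0, 1.0]
alpha, Sigma, variance = optimal_weights(C, sigma)
```

In this example the direct observation receives weight 3/4 and the detour weight 1/4, and the combined variance 3/4 is below the variance 1 of the direct observation alone, so even an observed entry is denoised.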

Remark 3.9 The above setup works in wider generality: (i) if $\mathrm{Var}(b_e) = 0$ is allowed
and there is an s-t path of all zero variance edges, the path kernel becomes positive
semi-definite; (ii) similarly, if $\mathcal{P}$ is replaced with any set of paths at all, the
same may occur. In both cases, we may replace $\Sigma^{-1}$ with the Moore-Penrose
pseudo-inverse and the proposition still holds: (i) reduces to the exact reconstruction case of
Theorem 2.4; (ii) produces the optimal estimator with respect to $\mathcal{P}$, which is optimal
provided that $\mathcal{P}$ is spanning, and adding paths to $\mathcal{P}$ does not make the
estimate worse.

3.3. Rank 2 and higher. An estimator for rank 2 and higher, together with a variance analysis,
can be constructed similarly once all polynomials are known which relate the entries to each
other. The main difficulty lies in the fact that these polynomials are not parameterized by
cycles anymore, but by specific subgraphs of G(M); see (Király et al., 2012, Section 2.5). Were
these polynomials known, an estimator similar to $X(\alpha)$ as in Notations 3.5 could be
constructed, and a subsequent variance (resp. perturbation) analysis performed.

3.4. The algorithms. In this section, we describe the two main algorithms which calculate the
variance-minimizing estimate $\hat{A}_{ij}$ for any fixed entry $A_{ij}$ of an $(m \times n)$
matrix A, which is observed with noise, and the variance bound for the estimate $\hat{A}_{ij}$.
It is important to note that $A_{ij}$ does not necessarily need to be an entry which is missing
in the observation; it can also be any entry which has been observed. In the latter case,
Algorithm 3 will give an improved estimate of the observed entry, and Algorithm 4 will give the
trustworthiness bound on this estimate.

Algorithm 1 Calculates path kernel $\Sigma$ and $\alpha$.
Input: index (i, j), an $(m \times n)$ mask M, variances $\sigma$.
Output: path matrix C, path kernel $\Sigma$ and minimizer $\alpha$.

1: Find a linearly independent set of paths $\mathcal{P}$ in the graph G(M), starting from i and
ending at j.
2: Determine the matrix $C = (c_{e,P})$ with $e \in G(M)$, $P \in \mathcal{P}$; set
$c_{e,P} = \pm 1$ if $e \in E^{\pm}(P)$, otherwise $c_{e,P} = 0$.
3: Define a diagonal matrix $S = \mathrm{diag}(\sigma)$, with $S_{ee} = \sigma_e$ for
$e \in G(M)$.
4: Compute the kernel matrix $\Sigma = C^\top S C$.
5: Calculate $\alpha = \Sigma^{-1} \mathbf{1} / \mathbf{1}^\top \Sigma^{-1} \mathbf{1}$.
6: Output C, $\Sigma$ and $\alpha$.

Since the path matrix C, the path kernel matrix $\Sigma$, and the optimal $\alpha$ are required
for both, we first describe Algorithm 1, which determines those. The steps of the algorithm
follow the exposition in Section 3.2; correctness follows from the statements presented there.
The only task in Algorithm 1 that isn't straightforward is the computation of a linearly
independent set of paths in step 1. We can do this in time linear in the number of observed
entries in the mask M with the following method. To keep the notation manageable, we will
conflate formal sums of the $x_e$, cycles in $H_1(G, \mathbb{Z})$ and their representations as
vectors in $\mathbb{R}^{mn}$, since there is no chance of confusion. We prove the correctness of
Algorithm 2 in Appendix A.

Algorithm 2 Calculates a basis $\mathcal{P}$ of the path space.

Input: index (i, j), an $(m \times n)$ mask M.

Output: a basis $\mathcal{P}$ for the space of oriented i-j paths.

1: If (i, j) is not an edge of M, and i and j are in different connected components, then
$\mathcal{P}$ is empty. Output $\emptyset$.
2: Otherwise, if (i, j) is not an edge of M, add a "dummy" copy.
3: Compute a spanning forest F of M that does not contain (i, j), if possible.
4: For each edge $e \in M \setminus F$, compute the fundamental cycle $C_e$ of e in F.
5: If (i, j) is an edge in M, output
$\{-x_{(i,j)}\} \cup \{C_e - x_{(i,j)} : e \in M \setminus F\}$.
6: Otherwise, let $P_{(i,j)} = C_{(i,j)} - x_{(i,j)}$. Output
$\{C_e - P_{(i,j)} : e \in M \setminus (F \cup \{(i, j)\})\}$.
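The following Python sketch illustrates the idea behind Algorithm 2 with a simplified variant, not a line-by-line transcription: it grows a breadth-first spanning tree from row i, takes the tree path to column j as a base element, and combines it with the fundamental cycle of each non-tree edge. It works with signed edge vectors directly instead of the formal generator $x_{(i,j)}$ and the dummy edge, restricts itself to the connected component of row i, and uses the convention that paths start at the blue (row) vertex. All names (`path_basis`, `add`, `walk`) are assumptions of this example.

```python
from collections import deque


def add(u, v, scale):
    """Return the signed edge vector u + scale * v, dropping zero entries."""
    out = dict(u)
    for e, s in v.items():
        out[e] = out.get(e, 0) + scale * s
        if out[e] == 0:
            del out[e]
    return out


def path_basis(mask, i, j):
    """Signed edge vectors (dict edge -> +1/-1) spanning the oriented paths
    from row i to column j inside the connected component of row i."""
    m, n = len(mask), len(mask[0])
    root = ("r", i)
    walk = {root: {}}        # signed edge vector of the tree walk root -> vertex
    tree_edges = set()
    queue = deque([root])
    while queue:
        kind, a = queue.popleft()
        if kind == "r":
            nbrs = [("c", b) for b in range(n) if mask[a][b]]
        else:
            nbrs = [("r", b) for b in range(m) if mask[b][a]]
        for nxt in nbrs:
            if nxt in walk:
                continue
            edge = (a, nxt[1]) if kind == "r" else (nxt[1], a)
            sign = 1 if kind == "r" else -1   # blue->red is +, red->blue is -
            walk[nxt] = {**walk[(kind, a)], edge: sign}
            tree_edges.add(edge)
            queue.append(nxt)
    if ("c", j) not in walk:
        return []                             # entry is not reconstructible
    base = walk[("c", j)]
    basis = [base]
    for r in range(m):
        for c in range(n):
            if not mask[r][c] or (r, c) in tree_edges or ("r", r) not in walk:
                continue
            # Fundamental cycle: edge (r, c) forwards, then back through the tree.
            cycle = add(add({(r, c): 1}, walk[("r", r)], 1), walk[("c", c)], -1)
            cand = add(base, cycle, 1)
            if any(abs(s) > 1 for s in cand.values()):
                cand = add(base, cycle, -1)   # orientation that cancels shared edges
            basis.append(cand)
    return basis


basis = path_basis([[1, 1], [1, 1]], 0, 0)
```

For the fully observed (2 x 2) mask and the entry (0, 0), the sketch returns the single-edge path and the three-edge detour through column 1 and row 1, so the path space has rank $\beta_1(G) + 1 = 2$ in that case.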



Algorithms 3 and 4 can then make use of the calculated C, $\alpha$, $\Sigma$ to determine an
estimate for any entry $A_{ij}$ and its minimum variance bound. The algorithms follow the
exposition in Section 3.2, from where correctness follows; Algorithm 3 additionally provides
treatment for the sign of the entries.

Algorithm 3 Estimates the entry $a_{ij}$.

Input: index (i, j), an $(m \times n)$ mask M, log-variances $\sigma$, the partially observed
and noisy matrix B.

Output: The variance-minimizing estimate for $A_{ij}$.

1: Calculate C and $\alpha$ with Algorithm 1.
2: Store B as a vector $b = (\log |B_e|)$ and a sign vector $s = (\mathrm{sgn}\, B_e)$ with
$e \in G(M)$.
3: Calculate $\hat{A}_{ij} = \pm \exp(b^\top C \alpha)$. The sign is + if each path in
$\mathcal{P}$ (i.e., each column of |C|, with |.| taken component-wise) contains an even number
of edges e with $s_e = -1$, else it is -.
4: Return $\hat{A}_{ij}$.

Algorithm 4 Determines the variance of the entry $\log(A_{ij})$.
Input: index (i, j), an $(m \times n)$ mask M, log-variances $\sigma$.
Output: The variance lower bound for $\log(A_{ij})$.

1: Calculate $\Sigma$ and $\alpha$ with Algorithm 1.
2: Return $\alpha^\top \Sigma \alpha$.

Note that even if observations are not available, Algorithm 4 can be used to obtain the variance
bound. The variance bound is relative, due to its multiplicativity, and can be used to
approximate absolute bounds when any reconstruction estimate $\hat{A}_{ij}$ is available, which
does not necessarily need to be the one from Algorithm 3 but can be the estimation result of any
reconstruction. Namely, if $\hat{\sigma}_{ij}$ is the estimated variance of the log, we obtain
an upper confidence (resp. deviation) bound
$\hat{A}_{ij} \cdot \exp\left(\sqrt{\hat{\sigma}_{ij}}\right)$ for $A_{ij}$, and a lower
confidence (resp. deviation) bound
$\hat{A}_{ij} \cdot \exp\left(-\sqrt{\hat{\sigma}_{ij}}\right)$, corresponding to the
log-confidence $\log \hat{A}_{ij} \pm \sqrt{\hat{\sigma}_{ij}}$. Also note that if $A_{ij}$ is
not reconstructible from the mask M (i.e., if the edge (i, j) is not in the transitive closure
of G(M), see Theorem 2.4), then the deviation bounds will be infinite.
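The conversion from a log-variance to multiplicative deviation bounds can be sketched as follows, assuming the bounds take the form estimate times $\exp(\pm\sqrt{\text{variance}})$. The function name `deviation_bounds` and the example numbers are hypothetical.

```python
import math


def deviation_bounds(estimate, log_variance):
    """Return (lower, upper) = estimate * exp(-/+ sqrt(log_variance)).

    The pair is sorted so that it also reads correctly for negative estimates.
    """
    spread = math.exp(math.sqrt(log_variance))
    return tuple(sorted((estimate / spread, estimate * spread)))


# Hypothetical estimate 2.0 with predicted log-variance 0.25.
lower, upper = deviation_bounds(2.0, 0.25)
```

Because the bounds are multiplicative, the geometric mean of the lower and upper bound equals the absolute value of the estimate, and the interval widens by a fixed factor rather than a fixed amount.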



4. Experiments 

4.1. Universal error estimates. For three different masks, we calculated the predicted minimum
variance for each entry of the mask. The multiplicative noise was assumed to be $\sigma_e = 1$
for each entry. Figure 1 shows the predicted a-priori minimum variances for each of the masks.
Notice how the structure of the mask affects the expected error: known entries generally have
the least variance, and it is interesting to note that in general it is less than the starting
variance of 1. That is, tracking back through the paths can be successfully used even to denoise
known entries. The particular structure of the mask is mirrored in the pattern of the predicted
errors; a diffuse mask gives a similar error on each missing entry, while the more structured
masks have structured error which is determined by combinatorial properties of the completion
graph and the paths therein.



4.2. Influence of noise level. We generated 10 random masks of size 50 × 50 with 200 entries
sampled uniformly and a random (50 × 50) matrix of rank one. The multiplicative noise was chosen
entry-wise independent, with variance $\sigma_i = (i - 1)/10$ for each entry. Figure 2(a)
compares the Mean Squared Error (MSE) for three algorithms: Nuclear Norm (using the
implementation of Tomioka et al. (2010)), OptSpace (Keshavan et al., 2010), and Algorithm 3. It
can be seen that on this particular mask, Algorithm 3 is competitive with the other methods and
even outperforms them for low noise.
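A data-generation step of this kind might look like the sketch below. The text fixes only the matrix size, the number of observed entries and the noise variance; the log-normal distributions for the factors and for the noise, the function name `make_instance`, and the seeding are assumptions made purely for illustration and are not taken from the experiments.

```python
import math
import random


def make_instance(size=50, observed=200, log_variance=0.5, seed=0):
    """Random positive rank-one matrix, a uniform mask and noisy observations.

    Assumption: factors and multiplicative noise are log-normal, so the noise
    is centered in the log domain with variance `log_variance`.
    """
    rng = random.Random(seed)
    u = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(size)]
    v = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(size)]
    truth = [[u[i] * v[j] for j in range(size)] for i in range(size)]
    # Sample the observed positions uniformly without replacement.
    cells = rng.sample([(i, j) for i in range(size) for j in range(size)], observed)
    std = math.sqrt(log_variance)
    observations = {
        (i, j): truth[i][j] * math.exp(rng.gauss(0.0, std)) for i, j in cells
    }
    return truth, observations


truth, observations = make_instance()
```

The dictionary `observations` plays the role of the masked, noisy matrix; its keys are the positions of the ones in the mask.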



4.3. Prediction of estimation errors. The data are the same as in Section 4.2, as are the
compared algorithms. Figure 2(b) compares the error of each of the methods with the variance
predicted by Algorithm 4 each time the noise level changed. The figure shows that for any of
the algorithms, the mean of the actual error increases with the predicted error, showing that
the error estimate is useful for a-priori prediction of the actual error, independently of the
particular algorithm. Note that by construction of the data this statement holds in particular
for entry-wise predictions. Furthermore, in quantitative comparison Algorithm 3 also outperforms
the other two in each of the bins.

5. Conclusion 

In this paper, we have introduced an algebraic combinatorics based method for reconstructing
and denoising single entries of an incomplete and noisy matrix, and for calculating confidence
bounds of single entry estimations for arbitrary algorithms. We have evaluated these methods
against state-of-the-art matrix completion methods. The results of Section 4 show that our
reconstruction method is competitive and that, for the first time, our variance estimate
provides a reliable prediction of the error on each single entry which is an a-priori estimate,
i.e., depending only on the noise model and the position of the known entries. Furthermore, our
method allows one to obtain the reconstruction and the error estimate for a single entry, which
existing methods are not capable of, possibly using only a small subset of neighboring entries.
This property makes our method unique and particularly attractive for application to large scale
data. We thus argue that the investigation of the algebraic combinatorial properties of matrix
completion, in particular in rank 2 and higher where these are not yet completely understood, is
crucial for the future understanding and practical treatment of big data.
A. Correctness of Algorithm 2

We adopt the conventions of Section 2, so that G is a bipartite graph with m blue vertices, n
red ones, and e edges oriented from blue to red. Recall the isomorphism, observed in the proof
of Theorem 2.6, of the $\mathbb{Z}$-group of the polynomials $L_C(\cdot)$ and the oriented cycle
space $H_1(G, \mathbb{Z})$.

Define $\beta_1(G) = e - n - m + c$ (the first Betti number of the graph), where c is the number
of connected components of G. Some standard facts are that: (i) the rank of
$H_1(G, \mathbb{Z})$ is $\beta_1(G)$; (ii) we can obtain a basis for $H_1(G, \mathbb{Z})$
consisting only of simple cycles by picking any spanning forest F of G and then using as basis
elements the fundamental cycles $C_e$ of the edges $e \in E \setminus F$. This justifies step 4.

Let (i, j) be an edge of G. Define an i-j path space to be the set of subgraphs P for which, for
generic rank one A, $L_P(A)$ recovers the logarithmic entry $a_{ij}$ (with the sign determined
by the orientation convention of Section 2). By Theorem 2.6, we can write these as
$\mathbb{Z}$-linear combinations of $x_{(i,j)}$ and oriented cycles. From this, we see that the
rank of the path space is $\beta_1(G) + 1$, and we obtain the graph theoretic identification of
elements in the path space with subgraphs that have even degree at every vertex except i and j.
Thus, if (i, j) is an edge of G, step 5 is justified, completing the proof of correctness in
this case.

If (i, j) was not an edge, step 1 guarantees that the dummy copy of (i, j) that we added is not
in the spanning tree computed in step 3. Thus, the element
$P_{(i,j)} = C_{(i,j)} - x_{(i,j)}$ computed in step 6 is a simple path from i to j. The
collection of elements generated in step 6 is independent by the same fact in
$H_1(G \cup \{(i, j)\}, \mathbb{Z})$, has rank $\beta_1(G) + 1$, and does not put a positive
coefficient on the dummy generator $x_{(i,j)}$. □
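The first Betti number used in the argument above can be computed directly from the mask. The sketch below counts edges and connected components with a union-find structure, where isolated rows and columns count as their own components; the function name `first_betti_number` is an assumption of this example.

```python
def first_betti_number(mask):
    """Return beta_1(G) = e - n - m + c for the completion graph of the mask."""
    m, n = len(mask), len(mask[0])
    parent = list(range(m + n))      # rows 0..m-1, columns m..m+n-1

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    edges = 0
    for i in range(m):
        for j in range(n):
            if mask[i][j]:
                edges += 1
                parent[find(i)] = find(m + j)

    components = len({find(v) for v in range(m + n)})
    return edges - n - m + components


full = first_betti_number([[1, 1], [1, 1]])
diagonal = first_betti_number([[1, 0], [0, 1]])
```

For the fully observed (2 x 2) mask there is exactly one independent cycle, so an observed entry has a path space of rank two; for the diagonal mask there are no cycles at all.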
Figure 1: The figure shows three pairs of masks and predicted variances. A pair consists of two
adjacent squares. The left half is the mask, which is depicted by a red/blue heatmap with red
entries known and blue unknown. The right half is a multicolor heatmap with color scale, showing
the predicted variance of the completion. Variances were calculated by our implementation of
Algorithm 4.
Figure 2: (a) mean squared errors; (b) error vs. predicted variance. For 10 randomly chosen
masks and a 50 × 50 true matrix, matrix completions were performed with Nuclear Norm (green),
OptSpace (red), and Algorithm 3 (blue) under multiplicative noise with variance increasing in
increments of 0.1. For each completed entry, minimum variances were predicted by Algorithm 4.
2(a) shows the mean squared error of the three algorithms for each noise level, coded by the
algorithms' respective colors. 2(b) shows a bin-plot of errors (y-axis) versus predicted
variances (x-axis) for each of the three algorithms: for each completed entry, a pair (predicted
error, true error) was calculated, the predicted error being the predicted variance, and the
actual prediction error measured as log abs of prediction minus log abs of true entry. Then, the
points were binned into 11 bins with equal numbers of points. The figure shows the mean of the
errors (second coordinate) of the value pairs with predicted variance (first coordinate) in each
of the bins; the color corresponds to the particular algorithm, and each group of bars is
centered on the minimum value of the associated bin.